---
layout: page
title: Script 2 - ggplot
permalink: /scripts/script2ggplot/
parent: R Scripts
nav_order: 3
---

Creating Plots with ggplot

Alongside the base R plotting functions we learnt about in script 2, there is a package designed to create prettier and more elegant data visualisations: ggplot2, which is built on the grammar of graphics. You can learn all about it on this website, this free online course or this YouTube webinar.

To use ggplot2, you are going to need to install the tidyverse collection of packages, which includes, alongside ggplot2, a host of other packages with functions that are widely used to wrangle, model and visualise data in R. These functions simply speed up things you can still do in base R, usually with longer lines of code, so to keep it simple and avoid confusion we are not going to cover them much in this course. However, if you feel like you’ve grasped the basics of data visualisation, you may want to try your hand at reproducing the graphs from script 2 with ggplot2 instead of base R. The code to do so is below and gives you a sense of the syntax of ggplot.
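As a rough sketch of that syntax: a ggplot call starts with the data and an aesthetic mapping, and layers are then added one at a time with +. The data frame toy below is made up for illustration and is not part of the qog data.

```r
library(ggplot2)

# Hypothetical toy data, made up for illustration (not the qog dataset)
toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2.1, 3.9, 6.2, 7.8, 10.1))

# Rough base R equivalent: plot(toy$x, toy$y, main = "Toy scatterplot")
p <- ggplot(data = toy, mapping = aes(x = x, y = y)) + # data and aesthetic mapping
  geom_point() +               # layer that draws one point per row
  theme_minimal() +            # overall look of the plot
  ggtitle("Toy scatterplot")   # title

print(p) # draws the plot
```

Storing the plot in an object such as p is optional, but it lets you add further layers later, for example p + xlab("My x axis").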



#install.packages("tidyverse") # installs the package you need
library(tidyverse) #loads the package
#load the data
qog <- read.csv("qog.csv")

Barplots

qog2 <- data.frame(table(region = qog$region)) #creates a dataframe of frequencies

ggplot(data = qog2, mapping = aes(x = region, y = Freq)) +
  geom_bar(stat = "identity") + #makes the barplot
  theme_minimal() + #removes ugly grey background
  theme(axis.text.x = element_text(angle = 90, vjust = 1)) + #rotates the x axis text
  ylab("Count") + #creates the y axis label
  ggtitle("Distribution of countries by region") #creates the title

ggplot(data = qog2, mapping = aes(x = region, y = Freq)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 1)) +
  ylab("Count") + #Freq is still the y aesthetic after coord_flip(), so it is labelled with ylab()
  ggtitle("Distribution of countries by region") +
  coord_flip() #flips x and y axes

qog3 <- data.frame(table(Status = qog$freedomhouse_status)) #creates a dataframe of frequencies

ggplot(data = qog3, mapping = aes(x = Status, y = Freq, fill = Status)) +
  geom_bar(stat = "identity") +
  theme_minimal() + ylab("Frequency") + xlab("Freedom House Status") +
  ggtitle("Barplot")
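The barplots above first build a table of frequencies and then draw it with geom_bar(stat = "identity"). As a minimal sketch of two alternatives that should give the same bars: geom_bar() with its default stat counts the rows in each category itself, and geom_col() is shorthand for geom_bar(stat = "identity"). The data frame toy is a made-up stand-in for qog.

```r
library(ggplot2)

# Hypothetical toy data standing in for qog (one row per country)
toy <- data.frame(region = c("Africa", "Africa", "Asia",
                             "Europe", "Europe", "Europe"))

# Option 1: let geom_bar() count the rows in each category
p_count <- ggplot(data = toy, mapping = aes(x = region)) +
  geom_bar() +
  theme_minimal() + ylab("Count")

# Option 2: pre-compute the frequencies, then draw them with geom_col()
toy_freq <- data.frame(table(region = toy$region))
p_col <- ggplot(data = toy_freq, mapping = aes(x = region, y = Freq)) +
  geom_col() +
  theme_minimal() + ylab("Count")

print(p_count)
print(p_col)
```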

Boxplot

ggplot(data = qog, mapping = aes(x = freedomhouse_status,
                                 y = human_devt_index,
                                 col = freedomhouse_status,
                                 fill = freedomhouse_status)) +
  geom_boxplot(alpha = 0.5) + theme_minimal() +
  xlab("Freedom House Rating") + ylab("Human Development Index") +
  theme(legend.position = "none") + #hides the legend, as the x axis already shows the groups
  ggtitle("Boxplot")


Histogram

ggplot(data = qog, mapping = aes(x = human_devt_index)) +
  geom_histogram(binwidth = 0.05, fill = "grey",
                 col = "black", alpha = 0.2) + #alpha (0 to 1) makes the fill more or less transparent
  theme_minimal() +
  xlab("UNDP Human Development Indicator") + ylab("Count") +
  ggtitle("Distribution of HDI") +
  theme(plot.title = element_text(hjust = 0.5)) + #centers the plot title
  geom_vline(xintercept = median(qog$human_devt_index, na.rm = TRUE), col = "red") + #median
  geom_vline(xintercept = mean(qog$human_devt_index, na.rm = TRUE), col = "blue") + #mean
  geom_vline(xintercept = mean(qog$human_devt_index, na.rm = TRUE) +
               sd(qog$human_devt_index, na.rm = TRUE), col = "blue", lty = 3) + #mean plus one standard deviation
  geom_vline(xintercept = mean(qog$human_devt_index, na.rm = TRUE) -
               sd(qog$human_devt_index, na.rm = TRUE), col = "blue", lty = 3) #mean minus one standard deviation
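The histogram code recomputes the mean and standard deviation inside each geom_vline() call. One possible tidier variant, sketched here on made-up values (the toy data frame and the names hdi_mean and hdi_sd are assumptions for illustration), computes them once and passes a vector of intercepts to a single geom_vline().

```r
library(ggplot2)

# Hypothetical toy values standing in for qog$human_devt_index
toy <- data.frame(hdi = c(0.35, 0.42, 0.55, 0.61, 0.68,
                          0.72, 0.74, 0.81, 0.88, NA))

# Compute the summary statistics once and reuse them below
hdi_mean <- mean(toy$hdi, na.rm = TRUE)
hdi_sd <- sd(toy$hdi, na.rm = TRUE)

p <- ggplot(data = toy, mapping = aes(x = hdi)) +
  geom_histogram(binwidth = 0.05, fill = "grey", col = "black",
                 alpha = 0.2, na.rm = TRUE) + # na.rm silences the missing value warning
  theme_minimal() +
  geom_vline(xintercept = hdi_mean, col = "blue") +
  # a vector of intercepts draws one line per value
  geom_vline(xintercept = c(hdi_mean - hdi_sd, hdi_mean + hdi_sd),
             col = "blue", linetype = 3)

print(p)
```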

Single Histogram with Various Groups

You can plot a histogram by groups by mapping the grouping variable to the colour or fill aesthetic.

Note that the bars will be stacked by default. You can pass position="identity" (bars overlaid on top of each other) or position="dodge" (bars by group plotted next to each other) to the geom_histogram() command to avoid this, but these other approaches are not ideal if you have more than two groups, as the plot quickly becomes hard to read.

ggplot(data = qog, mapping = aes(x = human_devt_index, fill = freedomhouse_status)) +
  geom_histogram(binwidth = 0.05, alpha = 0.6) +
  theme_minimal() +
  xlab("UNDP Human Development Indicator") + ylab("Count") +
  ggtitle("Distribution of HDI by Freedomhouse") +
  theme(plot.title = element_text(hjust = 0.5)) + #centers the plot title
  geom_vline(xintercept = mean(qog$human_devt_index[qog$freedomhouse_status == "Free"],
                               na.rm = TRUE), col = "red") +
  geom_vline(xintercept = mean(qog$human_devt_index[qog$freedomhouse_status == "Partly Free"],
                               na.rm = TRUE), col = "blue") +
  geom_vline(xintercept = mean(qog$human_devt_index[qog$freedomhouse_status == "Not Free"],
                               na.rm = TRUE), col = "green", lty = 2)
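To compare the three position options mentioned above, here is a small sketch on made-up data with two groups (the toy data frame and the plot names are assumptions for illustration). The same base plot is reused, and only the position argument changes.

```r
library(ggplot2)

# Hypothetical toy data: two groups with overlapping values
toy <- data.frame(value = c(0.32, 0.38, 0.42, 0.48, 0.52,
                            0.42, 0.48, 0.52, 0.58, 0.62),
                  group = rep(c("A", "B"), each = 5))

base <- ggplot(data = toy, mapping = aes(x = value, fill = group)) +
  theme_minimal()

# Default: bars of the two groups are stacked on top of each other
p_stack <- base + geom_histogram(binwidth = 0.1, alpha = 0.6)

# "identity": every bar starts at zero, so the groups overlap
p_identity <- base + geom_histogram(binwidth = 0.1, alpha = 0.6,
                                    position = "identity")

# "dodge": bars of the two groups are drawn side by side
p_dodge <- base + geom_histogram(binwidth = 0.1, alpha = 0.6,
                                 position = "dodge")

print(p_stack)
print(p_identity)
print(p_dodge)
```

With position = "identity", a low alpha matters, because otherwise the group drawn last hides the bars behind it.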


Density Plot

ggplot(data = qog, mapping = aes(x = human_devt_index)) +
  geom_density(bw = 0.025, fill = "lightblue", alpha = 0.2) +
  #bw sets the bandwidth of the density plot
  theme_minimal() +
  xlab("UNDP Human Development Indicator") + ylab("Density") +
  ggtitle("Density Plot") + #only one title is kept, so a second ggtitle() would override the first
  theme(plot.title = element_text(hjust = 0.5)) #centers the plot title
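To get a feel for what bw does, the sketch below overlays two density curves of the same made-up values (the toy data frame is an assumption for illustration): a small bandwidth gives a wiggly curve that follows individual observations, while a larger one gives a smoother curve.

```r
library(ggplot2)

# Hypothetical toy values standing in for qog$human_devt_index
toy <- data.frame(hdi = c(0.35, 0.42, 0.55, 0.61, 0.68,
                          0.72, 0.74, 0.81, 0.88))

p <- ggplot(data = toy, mapping = aes(x = hdi)) +
  geom_density(bw = 0.02, col = "red") +  # small bandwidth: wiggly curve
  geom_density(bw = 0.10, col = "blue") + # large bandwidth: smooth curve
  theme_minimal() +
  xlab("Toy HDI values") + ylab("Density")

print(p)
```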


Scatterplots

ggplot(data = qog, mapping = aes(x = polity,
                                 y = fragile_state_index)) +
  geom_point() + theme_minimal() + #plots all countries in black
  geom_point(data = subset(qog, country %in% c("Italy", "Greece")), col = "red",
             shape = 17, size = 3) + #highlights Italy and Greece with red triangles
  ylab("Fragile State Index") + xlab("Polity") + ggtitle("Scatterplot")

ggplot(data = qog, mapping = aes(x = polity,
                                 y = fragile_state_index,
                                 label = iso3c,
                                 col = freedomhouse_status)) +
  geom_text(size = 3) + theme_minimal() +
  scale_color_manual(values = c("royalblue", "tomato", "violet"),
                     labels = c("Free", "Not Free", "Partly Free"),
                     name = "Freedom House Rating") +
  ylab("Fragile State Index") +
  xlab("Polity") + ggtitle("Scatterplot with Text Labels")
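In the last plot, the colours and labels in scale_color_manual() are matched to the groups by position, so they rely on the categories being in alphabetical order. A slightly safer sketch, assuming the three status values are spelled exactly as below, passes a named vector so that each colour is tied to its category by name. The toy data frame and the name status_colours are made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for qog
toy <- data.frame(polity = c(10, -7, 3, 8),
                  fragile_state_index = c(30, 95, 70, 40),
                  status = c("Free", "Not Free", "Partly Free", "Free"))

# Named vector: each colour is matched to its category by name, not by order
status_colours <- c("Free" = "royalblue",
                    "Not Free" = "tomato",
                    "Partly Free" = "violet")

p <- ggplot(data = toy, mapping = aes(x = polity, y = fragile_state_index,
                                      col = status)) +
  geom_point(size = 3) + theme_minimal() +
  scale_color_manual(values = status_colours,
                     name = "Freedom House Rating") +
  ylab("Fragile State Index") + xlab("Polity")

print(p)
```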